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Surrogate Test to Distinguish between Chaotic and Pseudoperiodic Time Series 



in 
o 
o 

(N 
C 

a 



Q 

U 
d 



> 

o 
o 
o 



> 

• 1— I 

x 

S3 



Xiaodong LuoQ Tomomichi Nakamura, and Michael Small 
Department of Electronic and Information Engineering, 
Hong Kong Polytechnic University, Hung Horn, Hong Kong. 
(Dated: February 8, 2008) 

In this communication a new algorithm is proposed to produce surrogates for pseudoperiodic time 
series. By imposing a few constraints on the noise components of pseudoperiodic data sets, we devise 
an effective method to generate surrogates. Unlike other algorithms, this method properly copes with 
pseudoperiodic orbits contaminated with linear colored observational noise. We will demonstrate the 
ability of this algorithm to distinguish chaotic orbits from pseudoperiodic orbits through simulation 
data sets from the Rossler system. As an example of application of this algorithm, we will also 
employ it to investigate a human electrocardiogram (ECG) record. 

PACS numbers: 05.45.-a 



I. INTRODUCTION 

Surrogate tests Q are examples of Monte Carlo hy- 
pothesis tests Taking the surrogate test of nonlin- 
earity in a time series [lj as an example, we first need 
to adopt a null hypothesis, which usually supposes the 
time series is generated by a linear stochastic process 
and potentially filtered by a nonlinear filter 5]. Based 
on this null hypothesis, a large number of data sets (sur- 
rogates) are to be produced from the original time series, 
which keeps the linearity of the original time series but 
destroys all other structures. We then calculate some 
nonlinear statistics (discriminating statistics), for exam- 
ple, correlation dimension, of both the original time series 
and the surrogates. If the discriminating statistic of the 
original time series deviates from those of the surrogates, 
we can reject the null hypothesis we proposed and claim 
that the original time series is deterministic with certain 
confidence level (depending on how many surrogates we 
have generated, to be shown later). In general, to apply 
the surrogate technique to test if a time series possesses 
the property P we are interested, we first select a null 
hypothesis, which assumes the time series instead has a 
property Q opposite to P. We then devise a correspond- 
ing algorithm to produce surrogates from the observed 
data set. In principle, these surrogates shall preserve the 
potential property Q while destroying all others. The 
next step is to choose a suitable discriminating statistic, 
which shall be an invariant measure for both the surro- 
gates and the original time series if the null hypothesis 
is true. Hence if the discriminating statistic of the origi- 
nal time series distinctly deviates from the distribution of 
the discriminating statistic of the surrogates, the null hy- 
pothesis is unlikely to be true, or in other words, the time 
series is much more likely to possess the property P than 
Q. In this way, we can assess the statistical significance 
of our calculations through surrogate test technique even 
when we have only a very limited amount of observa- 
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tions. Such assessments are important because in many 
practical situations statistical fluctuations are inevitable 
due to the presence of noise, hence the surrogate test is 
a proper tool to evaluate the reliability of our results in 
a statistical sense. 

In this communication, we are focused on discussing 
the algorithm to generate surrogates for pseudoperiodic 
time series. By pseudoperiodic time series we mean a 
representative of a periodic orbit perturbed by dynami- 
cal noise, or contaminated by observational noise, or with 
the combination of the both noises, whose states within 
one cycle are largely independent of those within previous 
cycles given a cycle length. Note that, in our discussions 
we will always assume we have detected that the time se- 
ries are produced from nonlinear deterministic systems, 
but they are also possibly contaminated by some noises. 
As we know, if an irregular time series comes from a 
nonlinear deterministic system, it shall be either chaotic 
or pseudoperiodic in most cases. In some situations, it 
might be important for us to discriminate between pseu- 
doperiodicity and chaos. However, chaotic and pseudope- 
riodic time series often look similar, we might not be able 
to distinguish them from each other only through visual 
inspections, quantitative techniques arc needed instead 
at this time. One choice is to apply the direct test tech- 
niques. For instance, we can calculate some characteristic 
statistics of the time series, such as the Lyapunov expo- 
nent and the correlation dimension. However, a direct 
test usually will not give out the confidence level. If we 
find the Lyapunov exponent of a time series is, for ex- 
ample, 0.01, it may be difficult for us to tell whether the 
time series is chaotic or the time series is pseudoperiodic, 
but the presence of noise causes the Lyapunov exponent 
to be slightly larger than zero. As an alternative choice, 
we suggest one utilizes the surrogate test rather than the 
direct test, which can provide us the confidence level by 
calculating a large number of surrogates. Through the 
surrogate tests, if we could exclude the possibility that 
the time series is pseudoperiodic, then the time series is 
more likely to be chaotic. This is the essential idea to 
apply our algorithm to distinguish chaos from pseudope- 
riodicity, as to be shown in section III. 
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First let us briefly review some of the algorithms to 
generate surrogates for pseudoperiodic time series. Ini- 
tially, to generate surrogates for pseudoperiodic time se- 
ries, Theiler Q proposed the cycle shuffling algorithm. 
The idea is to divide the whole data set into some seg- 
ments and let each segment contain exactly an integer 
number of cycles. The surrogates are obtained by ran- 
domly shuffling these segments, which will preserve the 
intracycle dynamics but destroy the intercycle ones by 
randomizing the temporal sequence of the individual cy- 
cles. The difficulty in applying this algorithm is that 
it requires preknowledge of the precise periodicity, other- 
wise shuffling the individual cycles might lead to spurious 
results 0- 

Recently, with the development of the cyclic theory of 
chaos [p] , many authors have shown interest in searching 
unstable periodic orbits (UPOs) in noisy data sets from 
chaotic dynamical systems. The algorithms proposed in 
essentially deal with the unstable fixed points of the 
UPOs. But as observed, the presence of noise will re- 
duce the statistical significance of these algorithms. One 
remedy is to introduce the surrogate test for reliability as- 
sessments, e.g., Dolan et.al claimed that the randomly 
shuffling surrogate algorithm p| together with the simple 
recurrence method |j| correctly tests the appropriate null 
hypothesis. Essentially, this approach is very similar to 
the cycle shuffling algorithm described previously. The 
simple recurrence algorithm is equivalent to applying a 
Poincare map on the continuous dynamical systems and 
then studying only the data points falling on the cross- 
section plane, hence one does not need to consider the 
intracycle dynamics and no knowledge of the periodicity 
is required, while randomly shuffling these data points 
exactly aims to randomize the temporal sequence of the 
cycles. However, one potential problem of this algorithm 
is that it might generate spuriously high statistical sig- 
nificance due to the correlation between the cycles @ . 

Later, Small et.al proposed the pseudoperiodic sur- 
rogate (PPS) algorithm from another viewpoint. T hey 
first apply the time delay embedding reconstruction [Ig 
to the original data set, then utilize a method based 
on local linear modelling techniques to produce surro- 
gate data which approximate the behavior of the under- 
lying dynamical system. As the authors pointed out, 
this algorithm works well even with very large dynami- 
cal noise, but it may incorrectly reject the null hypothesis 
if the intercycles of the pseudoperiodic orbit have a lin- 
ear stochastic dependence induced by colored additive 
observational noise (lo| . 

In this communication we propose a new surrogate al- 
gorithm for continuous dynamical systems, which prop- 
erly copes with linear stochastic dependence between the 
cycles of the pseudoperiodic orbits. The null hypothesis 
to be tested is that the stationary data set is pseudope- 
riodic with noise components which are (approximately) 
identically distributed and uncorrelated for sufficiently 
large temporal translations. Note the constraints of the 
noise components in our null hypothesis are stronger than 




(c) 



I** **** j» 



20 40 60 80 100 

ndices of 100 generated surrogates 



FIG. 1: (a) Pseudoperiodic time series contaminated by obser- 
vational noise; (b) State space Xi+ n vs. x% of the pseudoperiodic 
time series from the Rossler system with n — 16; (c) Surrogate 
test for the pseudoperiodic time series based on our algorithm. 
The abscissa is the indices of 100 surrogates and the ordinate 
is the corresponding correlation dimensions. Fhe middle line 
is the mean correlation dimension of the original time series 
calculated 100 times using the GKA, the upper and lower lines 
denote the correlation dimensions twice the standard deviation 
away from the mean value and the asterisks indicate the corre- 
lation dimensions of 100 surrogates. 



that of Theiler's algorithm, which requires the noise dis- 
tribution only periodically depends on the phase of the 
signal. However, under our hypothesis, we can produce 
the surrogates in a simple way through the algorithm to 
be described below. In addition, a large scope of noise 
processes often encountered in practical situations, in- 
cluding (but not limited to) linear colored additive obser- 
vational noise described by the ARMA(p, q) model [l5|. 
match the above constraints. 

The remainder of this communication is organized as 
follows. In Sec. II we will introduce the new algorithm to 
generate pseudoperiodic surrogates, while in Sec. Ill we 
will apply this algorithm to simulation data sets from the 
Rossler system, which demonstrates the ability of the sur- 
rogate test based on this algorithm to distinguish chaotic 
orbits from pseudoperiodic ones. As one of the applica- 
tions, we will use this surrogate technique to investigate 
whether a human electrocardiogram (ECG) record is pos- 
sibly presentative of a chaotic dynamical system. Finally, 
in Sec. IV, we will have a summary of the whole com- 
munication. 



II. A NEW ALGORITHM TO GENERATE 
PSEUDOPERIODIC SURROGATES 

Let {xi}f =1 be a data set with N observations (the 
form {xi} is adopted instead for convenience when caus- 
ing no confusion), where Xi means the observation mea- 
sured at time U = i ■ At s with At s denoting the sam- 
pling time. We assume {xi}^ =1 is stationary and can be 
decomposed into the deterministic components and the 
noise components, which are approximately independent 
of each other. Similar to the surrogate test idea of time 
shifting to desynchronize two data sets [l^, we assume 
the noise components (approximately) follow an identi- 
cal distribution and are uncorrelated for sufficiently large 
temporal translations (or time shifts). According to the 
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FIG. 2: (a) Pseudoperiodic time series with both observational 
noise and dynamical noise; (b) State space Xi+ n vs. Xi of the 
pseudoperiodic time series from the Rossler system with n = 16; 
(c) Surrogate test for the pseudoperiodic time series based on 
our algorithm. The meaning of the lines and the asterisks is 
the same as that in panel (c) of Fig. 



null hypothesis we proposed in the previous section, if the 
deterministic components are periodic, then we can write 
a data point Xj, as Xi = pi + rii, where pi and rii denote 
the periodic component and the noise component respec- 
tively. In many cases, we can set E(pi) — E(yii) = 
where E is the expectation operator. Since {pi\ are 
roughly independent of {rii}, we have the autocovariance 
var(xi) — var(pi) + var(rii). Let 

y\ = ax l +(3x i+T = (ap l + (3p l+T ) + {am + f3n. l+T ) (1) 

with i = 1,2, ...,N — r, where coefficients a and (3 
satisfy a 2 + 1 = 1 and parameter r is the temporal 
translation between subsets {xi}^ =1 T and {xi+ T }^ =1 T , 
then the autocovariance function var(yj) — var(api + 
(3pi+ T ) + var(arii + j3rii +T ). Now let us consider the 
noise components. If r is sufficiently large, under our 
hypothesis, rii and n,i +T are uncorrelated. We also note 
that {fii} and {rii^ T } are drawn from (approximately) 
the same distribution, we have variant + (3rii +T ) — 
var(rii). For the deterministic component, if we re- 
quire the translation r to satisfy cov(jpi,pi+ T ) — 0, then 
var(api+ f3pi+ T ) = var(pi). Hence by choosing a suitable 
temporal translation, the noise levels of {yj}, defined by 

1/2 

(var(arii + /3ni+ T )/var(yJ)) , will be the same as that 
of {xi}^ =1 , i.e., (var(rii) / var(xi)) 1 ^ 2 . The reason to pre- 
serve the noise level is that, the presence of noise will 
affect the calculation of the correlation dimension, hence 
we would like to let the surrogates and the original time 
series (roughly) have the same noise level in order to make 
the results more conceivable. 

The above deduction leads to the central idea of our 
surrogate algorithm. From Eq. (Q, we note that if 
{pi} is periodic, the nonconstant deterministic compo- 
nents {api + f3pi+ T } shall also be periodic. In addition, 
{xi}f =1 and {yj} shall have the same noise level if a 
suitable translation r is selected. Therefore by random- 
izing the coefficient a or /3, we can generate many data 
sets {yj } as the surrogates of {xi\f =1 . Note that {pi} 
and {api + f3pi +T } have the same degree-of- freedom, if 
both of them are periodic, their correlation dimensions 
[l3f will theoretically be the same. Now let us consider 



the noise components. Although the noise components 
{atii + [3rii +T } may have a different distribution from 
that of {rii}, the noise level is preserved after the trans- 
form in Eq. (JIJ . As Diks |2(j has reported, the Gaussian 
kernel algorithm (GKA) can reasonably estimate the cor- 
relation dimensions of noisy data sets with different noise 
distributions. This implies that, under the same noise 
level, the correlation dimensions of {xi}^ =1 and {yj}, 
calculated by the GKA, shall statistically be the same 
if { x i}iLi an d {yj} are both pseudoperiodic (and sat- 
isfy the constraints we imposed). In contrast, if {pi} is 
chaotic, its linear combination, {api + /3pi+ T }, may have 
a new dynamical structure with a different correlation di- 
mension from that of {pi}, hence by adopting the corre- 
lation dimension as the discriminating statistic we might 
detect this difference. 

We shall also note that, for an unstable periodic orbit, 
even a small dynamical noise might drive the resultant or- 
bit far away from the original position after a sufficiently 
long time, and the pseudoperiodicity might be broken. In 
such situations, our algorithm might fail to work. Nev- 
ertheless, we suggest to apply our algorithm as the first 
step in pseudoperiodicity test, which is computationally 
fast and in principle deals well with a large scope of ob- 
servational noise (comparatively, the PPS algorithm will 
sometimes fail for colored observational noise) . If we can 
reject the null hypothesis proposed previously, the time 
series in test is possibly chaotic or pseudoperiodic per- 
turbed by dynamical noise. Then we can adopt the PPS 
algorithm for further tests, which works well even under 
a large amount of dynamical noise. If the corresponding 
null hypothesis, i.e., the time series is pseudoperiodic per- 
turbed by dynamical noise, can be rejected again, then 
we may claim the time series is very likely to be chaotic. 

We now consider several computational issues in our al- 
gorithms. As described in Eq. (J), to generate the surro- 
gates {yj}, we select two subsets of {xi}f =1 , {xi}fjl T and 

{xi+ T }iLi T , multiply them by the coefficients a and (3 re- 
spectively and then add them together. We shall empha- 
size that choosing the temporal translation r is a crucial 
issue for our algorithm. From one aspect, we require the 
translation r to satisfy the condition cov{pi,pi +T ) = 0. 
The reason is that we want to keep the noise level for 
the original time series and the surrogates. In addition, 
we want the deterministic components {api} to be or- 
thogonal to {(3pi +T } for arbitrary coefficients a and /3, 
otherwise the projection of {api} onto {/3pi+ T } might 
counteract {[3pi +T } under some situations, for example, 
if pi ~ —Pi+r and a = f3, the deterministic components 
{api + (3pi +T } will almost vanish while the noise compo- 
nents {arii + (3rii +T } remain. Hence the correlation di- 
mensions calculated are actually those of the noise com- 
ponents instead of the deterministic components, which 
will certainly cause the false rejection of the null hy- 
pothesis. From another aspect, we require r to be suffi- 
ciently large to guarantee the decorrelation between the 
noise components. However, we expect {xi}f =1 T and 
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FIG. 3: (a) Chaotic time series contaminated by observational 
noise ; (b) State space Xi+„ vs. Xi of the chaotic time series 
from the Rossler system with n = 16; (c) Surrogate test for the 
chaotic time series based on our new algorithm. The meaning 
of the lines and the curve is the same as that in panel (c) of 
Fig- d (d) Surrogate test for the chaotic time series based on 
the PPS algorithm. The meaning of the lines and the asterisks 
is the same as that in panel (c) of Fig. 



{xi+ T }f =1 T shall have at least some overlaps to make use 
of the information of the whole data set {xi}--,, which 
means r shall not exceed N/2. In addition, it is recom- 
mended the length of a data set shall not be too short in 
order to appropriately calculate its correlation dimension 
[l4|. which also implies r shall not be too large. 

From Eq. we see that the surrogates are gen- 



erated from two segments {xi}f =1 r and {xi+ T }iLi T of 
the original time series {xi]f =l . We want segments 
{xi\ 1 ^ = ~ l T and {xi+ T }^J[ T to equivalently affect the gen- 
eration of the surrogates, therefore we would like to let 
max(|o://3|) = max(|/3/o;|), min(|a//3|) = min(|/3/a|) and 
Pr(|a//3| ^ 1) ~ Pr(|/3/a| ^ 1), where max(-), min(-) 
and Pr(-) denote the maximal function, the minimal func- 
tion and the probability function respectively. But note 
that the coefficient ratio a//3 (or (3 /a) shall not be too 
large or too small, otherwise {yj} will be very close 
to {x{} i=1 T or {xi+ T }i-i T , which will lead to approxi- 
mately the same correlation dimensions of {x{\f =1 and 
{yj} regardless of the dynamical behavior of {xi}f =1 , 
and thus decrease the discriminating power of the cor- 
relation dimension. In our calculations we let a be uni- 
formly drawn from the interval [—0.8,-0.6] U [0.6,0.8] 
and /3 = Vl — a 2 , which satisfies our requirements and 
provides moderate values for the ratio a/ (3. 



III. SURROGATE TEST TO DISTINGUISH 
BETWEEN CHAOTIC AND PSEUDOPERIODIC 
TIME SERIES 

In this section, through four examples from the Rossler 
system, we demonstrate the ability of surrogate test 
based on our algorithm to discriminate chaotic orbits 
from pseudoperiodic ones. As an application, we will also 
employ the surrogate technique to investigate whether 
a recorded human electrocardiogram (ECG) data set is 
possibly chaotic. 



EXAMPLES 



The equations of the Rossler system are given by 



x = —y — z, 
y = x + ay, 
z = b + z(x — c). 



(2) 



with the initial conditions x(0) = y(0) = z(0) =0.1. We 
choose parameters b — 2, c = 4 and the sampling time 
At s = 0.1 time units. For each example, the system is 
to be integrated 10,000 times and the first 1,000 data 
points are discarded to avoid including transient states. 

In the first example, we set parameter a = 0.39095. 
The Rossler system exhibits limit cycle behavior of pe- 
riod 6. To obtain pseudoperiodic time series, we intro- 
duce 5% observational noise into the periodic time series. 
Although Gaussian white observational noise is the most 
common choice in this situation, in order to demonstrate 
the ability of our surrogate algorithm to deal with col- 
ored noise, we will instead adopt the noise generated from 
the AR(1) process = 0.8^ + with the vari- 

able e following the normal Gaussian distribution JV(0, 1), 
which is the more difficult case due to the correlation be- 
tween noise components. However, one shall note that, 
Gaussian white noise and other colored noises satisfying 
the constraints in our null hypothesis, for example, those 
modelled by the ARMA(p, q) processes, in principle can 
be dealt with in the same way. For convenience of obser- 
vation and comparison, we plot the time series and the 
corresponding attractor in two dimensional state space 
(or embedding space) in panels (a) and (b) of Fig. Q] 
respectively. 

To produce surrogate data, first we shall choose a suit- 
able temporal translation. Since it is impractical to sep- 
arate noise from signal completely, in general it is diffi- 
cult to estimate the correlation decay time between noise 
components. Fortunately, to decorrelate noise compo- 
nents, all temporal translations are equivalent as long as 
they are large enough. In addition, in many real sit- 
uations, it is often possible to observe the background 
noise and thus estimate the decay time. In our exam- 
ple, we think the AR(1) noise to be uncorrelated when 
the temporal translation is larger than 50 (in units of 
the sampling time At s ). As another requirement, tem- 
poral translation satisfying cov{pi^pi-i rT ) = is desired. 
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In practice, of course, this requirement is generally im- 
practical due to the digitization and quantization in sam- 
pling process. Recall the discussion in the previous sec- 
tion, by letting E{pi) — and a 2 + (3 2 — 1, we have 
var(api + I3p i+T ) = var(pi) + 2a(3 ■ cov(pi,p i+T ). Func- 
tion cov(pi,pi+ T ) ^ means we do not preserve the noise 
level. However, under the null hypothesis of pseudope- 
riodicity, there shall always be some temporal transla- 
tions to make cov(pi,pi+ T ) ~ 0, hence the noise level 
will not deviate from the original one too much. Be- 
sides, according to Eq. (JJ, we generate the surro- 
gates by uniformly drawing coefficient a from interval 
[-0.8, -0.6] U [0.6, 0.8] (/3 = Vl-a 2 is always kept pos- 
itive), the noise level of the surrogates will fluctuate 
around that of the original one due to the alternative 
signs of product a(3. Therefore, cov(pi,pi +T ) =/= will 
only cause some fluctuations when to calculate the cor- 
relation dimension because of the fluctuations of noise 
level, however, generally such fluctuations will not affect 
our conclusion if we can select a temporal translation 
r to let cov{piiPi+ T ) ~ 0. Since we have assumed the 
noise components are roughly independent of the deter- 
ministic components, then cov(xi, Xi+ T ) = cov(pi,pi +T ) 
for a large enough temporal translation (to decorrelate 
noise components), therefore in all of the examples, in 
order to let cov(j>i,pi+ T ) ~ 0, we can equivalently require 
cov (xi, Xi +T ) ~ 0. In the first example, there are many 
temporal translations satisfying the two constraints dis- 
cussed above, i.e., r > 50 and cov(xi, Xi +T ) ~ 0. To 
pick a value from all these candidates, we first select 
an interval [100,150], then search the temporal trans- 
lation which makes the absolute value \cov(Xi, Xi+ T )\ be 
the minimum (most close to zero) among all translations 
100 ^ t ^ 150. One shall note that the choice of the 
interval [100, 150] is arbitrary, except that we have to 
make sure that the lower bound of the interval is larger 
than 50, and there exists temporal translations to let 
cov(xi, Xi +T ) ~ within the interval. After selecting the 
temporal translation, by randomizing the coefficient a we 
will generate 100 surrogates according to Eq. £[J. 

In order to calculate the correlation dimension, we 
adopt the time delay embedding reconstruction |l6( to 
recover the underlying system from the scalar time se- 
ries. Two parameters, i.e., embedding dimension and 
time delay, shall be properly chosen to apply this tech- 
nique. Throughout this communication, we will use the 
false nearest neighbour criterion to determine the 
global optimal embedding dimension. Using the program 
m TISEAN package [H|, the embedding dimension m of 
the original time series is selected at 4, which is the mini- 
mal value to make the fraction of false nearest neighbours 
be zero. To choose a suitable time delay, we will use the 
algorithm of redundancy and irrelevance tradeoff expo- 
nent (RITE) proposed in j^] ■ This algorithm selects the 
time delay by searching the optimal tradeoff between re- 
dundancy (due to too small time delay) and irrelevance 
(due to too large time delay). As demonstrated, the 
RITE algorithm can select equivalently suitable time de- 
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FIG. 4: (a) Chaotic time series with both dynamical and ob- 
servational noises; (b) State space Xi+ n vs. Xi of the chaotic 
time series from the Rossler system with n = 16; (c) Surrogate 
test for the chaotic time series based on our new algorithm. 
The meaning of the lines and the asterisks is the same as that 
in panel (c) of Fig. (d) Surrogate test for the chaotic time 
series based on the PPS algorithm. The meaning of the lines 
and the asterisks is the same as that in panel (c) of Fig. 



lays compared to the average mutual information (AMI) 
criterion [Tgj . however, its implementation is much sim- 
pler and the computational cost is fairly low. Therefore 
in case of large data sets, adopting the RITE algorithm 
can facilitate our calculations. In the first example we 
generate 100 surrogates, and for each surrogate we keep 
the embedding dimension m = 4 and use the RITE al- 
gorithm to choose the suitable time delay for time delay 
reconstruction. 

We will follow Diks's method |2(j to calculate the cor- 
relation dimension, which is more robust against noise 
by extending the hard kernel function (or the Heavi- 
side function) ^| in calculation of correlation integral 
to the general kernel functions. In his discussions, Diks 
adopted the Gaussian kernel function, hence this method 
is called Gaussian kernel algorithm (GKA). Here we will 
use the GKA implemented in [2l| to calculate the corre- 
lation dimensions, which further enhances the computa- 
tional speed. Note that to speed up the calculation, only 
2000 data points are used as the reference points for the 
GKA. There are some statistical fluctuations even for the 
same data set when calculating its correlation dimension, 
therefore for the original time series, we will calculate 100 
times to estimate the mean correlation dimension and the 
standard deviation. As shown in panel (c) of Fig. ^ there 
are three lines parallel to the abscissa. The middle line 
denote the estimation of the mean correlation dimension 
of the original time series, while the upper and lower lines 
indicate the positions twice the standard deviation away 
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from the mean value. For the surrogates, however, we 
will calculate their correlation dimensions only once to 
save time. The results are illustrated as the asterisks in 
panel (c) of Fig. 

After the calculation of the correlation dimensions, we 
need to inspect whether the result is consistent with our 
null hypothesis. Here we use the ranking criterion Q 
to determine whether the null hypothesis shall be re- 
jected or not. The idea of this criterion is that, sup- 
pose the discriminating statistic of the original data set 
is Qq, and those of N$ surrogates are {Qi, Q2, Qn s }- 
Rank the statistics {Qo,Qi, ...,Qn s } in the increasing 
order and denote the rank of Qo by ro, if the data set 
is consistent with the hypothesis (i.e., no evidence to 
reject), ro can have an equal possibility be any integer 
value between 1 and Ns + 1. However, if the hypothesis 
is false, Qo might deviate from the surrogate distribu- 
tion {Qi,Q2, ■■■,Qns}> i- e ; Qo w iH b e the smallest or 
largest amongst {Qo,Qi, ...,Qat s }, hence we can reject 
the null hypothesis if ro = 1 or Ns + 1, the probability 
of a false rejection is 1/ (Ns + 1) for one-sided tests and 
2/ (N s + 1) for two-sided tests. 

For the first example, from panel (c) of Fig. \I\we can 
see that, the mean correlation dimension of the original 
time series falls within the dimension distribution of the 
surrogates, therefore we cannot reject the null hypothe- 
sis as we expect, which means the original time series is 
possibly pseudoperiodic p3 |. 

Now let us examine the other examples. In the second 
example, we still set parameter a — 0.39095 in Eq. J2J. 
However, to obtain the pseudoperiodic time series, we 
first generate a data set by adding Gaussian white noise 
with the standard deviation of 0.15% to the x compo- 
nent at each integration step, which simulates the system 
perturbed by additive dynamical noise, and then intro- 
duce 5% observational AR(\) noise into the previously 
obtained data set. The global optimal embedding dimen- 
sion is chosen at m = 4. Note in all of the four examples, 
we will generate 100 surrogates, and parameter choices 
for surrogate generation will be the same, i.e., we let the 
temporal translation be selected from [100, 150] and coef- 
ficient a be uniformly drawn from [—0.8, —0.6] U [0.6, 0.8] 
(/3 = vl — or). For the second example, the correlation 
dimensions of the original time series and the surrogates 
are shown in panel (c) of Fig. [2] Under the ranking 
criterion, once again we cannot reject our null hypothe- 
sis. Therefore the time series is possibly pseudoperiodic, 
which is consistent with our knowledge. 

In the third example, we change parameter a of Eq. 
to be 0.395. The Rossler system exhibits chaotic be- 
havior. We integrate Eq. J5J to obtain a time series 
and then introduce 5% observational AR(1) noise. The 
optimal embedding dimension m is selected at m = 5. 
From panel (c) of Fig. [31 we find that the mean correla- 
tion dimension of the original time series deviates from 
the distribution of the surrogate dimensions. Using the 
ranking criterion, we can reject our null hypothesis. In 
order to exclude the possibility that the time series is 
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FIG. 5: (a) Time series of a human electrocardiogram (ECG) 
record; (b) State space Xi+ n vs. Xi of the ECG record with 
n = 23; (c) A surrogate data generated from our algorithm 
with coefficient a = /3 = l/V% (cf. Eq. Q); (d) Surrogate test 
for the ECG record based on our algorithm. The meaning of 
the lines and the asterisks is the same as that in panel (c) of 

Fig. EJ 



generated from an unstable period orbit perturbed by 
dynamical noise, we also apply the PPS algorithm for 
further test. From the PPS algorithm we generate 100 
surrogates, and then use the GKA to calculate their cor- 
relation dimensions. The results are shown in panel (d) of 
Fig. |3 as we can see, the mean correlation dimension of 
the original time series also falls outside the distribution 
of the surrogate dimensions, therefore we can reject the 
null hypothesis again. After the two surrogate tests for 
pseudoperiodicity, we can claim the time series is chaotic 
with a confidence level up to 96% (98% x 98%) for two- 
sided test. 

The final example to be demonstrated is a chaotic time 
series also from the Rossler system. To generate the time 
series, we keep parameter a = 0.395. Similar to the way 
in the second example, we add Gaussian white noise with 
the standard deviation of 0.15% to the x component at 
each integration step as the dynamical noise, and then 
introduce 5% observational AR(\) noise into the previ- 
ously obtained data set. The global optimal embedding 
dimension is found to be m = 4. The results of surrogate 
tests based on the new algorithm and the PPS algorithm 
are shown in panel (c) and (d) of Fig. ^respectively, from 
which we can see that, surrogate tests based on both al- 
gorithms can detect the chaos in the time series. Again 
we can claim the time series is chaotic with a confidence 
level up to 96% for two-sided test. 

We have also investigated examples under different ob- 
servational noise levels (but keep the same dynamical 
noise if they have). For example, if we reduce the AR(l) 
observational noise levels to 3% in the above four exam- 
ples, we can obtain the same results as we have reported. 
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If we increase the observational noise levels to 10%, for 
the pseudoperiodic time series we can still obtain the ex- 
pected results, i.e., we cannot reject our null hypothesis. 
However, for the chaotic time series, we will falsely ac- 
cept our null hypothesis due to the correlation dimension 
of the original time series marginally falling within the 
dimension distribution of the surrogates. The reason of 
false acceptance might be that, under large noise levels, 
the correlation dimension is not sensitive enough to de- 
tect the structure changes of the chaotic time series. For 
such cases, we will have to look for more powerful dis- 
criminating statistics |24j . 

B. AN APPLICATION 

As an example of application, we employ the surrogate 
test based on our algorithm to investigate whether a hu- 
man electrocardiogram (ECG) record (with 8975 data 
points) is likely to be chaotic. The ECG record was 
obtained by measuring from a resting healthy subject 
(22 years old) in a shielded room at the sampling rate 
of 1 KHz. The ECG record indicated in panel (a) of 
Fig. [S] looks very regular and even possibly periodic, 
but we need a quantitative approach to verify the pe- 
riodicity. Here we choose the surrogate test technique. 
Using the false nearest neighbour criterion, the global 
optimal embedding dimension is chosen at m = 5. The 
background noise is mainly from the measurement in- 
struments, usually it is a blend of white and correlated 
noise. By observing the linear second order correlation 
function of the ECG data, we let the temporal translation 
be within the interval [100, 150] (large enough to decor- 
relate the noise components), where there exists an in- 
teger temporal translation to make the correlation func- 
tion almost be zero. Then by uniformly drawing values 
from [-0.8, -0.6] U [0.6, 0.8] for coefficient a in Eq. |JTJ 
{(3 = VI — a 2 ), we generate 100 surrogates. For demon- 
stration, we plot in panel (c) one surrogate generated 
from Eq. with coefficient a = /3 = 1/V2- We can see 
that the surrogate is different from the original ECG data 
in that there appear more spikes in the surrogate. How- 
ever, as we can also find, the surrogate indicates the sim- 
ilar regularity to that in the original data, which implies 
that the surrogate preserves the potential periodicity in 
the original data as we expect (although in a different 
pattern). With regards to the surrogate test, our cal- 
culation of the correlation dimensions is also based on 
the GKA. The results are indicated in panel (d) of Fig. 
[SI from which we can see that the mean correlation di- 
mension of the ECG data falls within the distribution of 



the correlation dimensions of the surrogates, therefore we 
cannot reject our null hypothesis. Hence the ECG record 
is possibly periodic. Moreover, there is no evidence of 
chaos. 

IV. CONCLUSION 

To summarize, by imposing a few constraints on the 
noise process, we devise a simple but effective way to 
produce surrogates for pseudoperiodic orbits. The main 
idea of this algorithm is that a linear combination of any 
two segments of the same periodic orbit will generate an- 
other periodic orbit. By properly choosing the temporal 
translation between the two segments, under the same 
noise level we can obtain statistically the same correla- 
tion dimensions of the pseudoperiodic orbit and its sur- 
rogates. Choosing the temporal translation is a crucial 
issue for our algorithm, which in principle shall guaran- 
tee the decorrelation between the noise components and 
preserve the noise level. Another important issue is to 
select a proper discriminating statistic which helps de- 
termine whether to reject the null hypothesis. The cor- 
relation dimension is a suitable discriminating statistic in 
this case. It is possible there are other suitable discrimi- 
nating statistics, we will leave the problem of finding such 
statistics for future study. 

The surrogate test technique based on our algorithm 
can be utilized to distinguish between chaotic and pseu- 
doperiodic time series. Initially, the PPS algorithm was 
proposed to generate surrogates for a pseudoperiodic or- 
bit driven by dynamical noise, but sometimes surrogate 
tests based on this algorithm will falsely reject the null 
hypothesis if the time series is also contaminated by col- 
ored observational noise. As a complement to the PPS 
algorithm, our algorithm deals well with observational 
noise, but it might fail for large dynamical noise. How- 
ever, due to the convenience in computation, we suggest 
to adopt surrogate test based on our algorithm as the 
first step for pseudoperiodicity detection. If we can re- 
ject the null hypothesis of our algorithm, then we shall 
use the PPS algorithm for further tests. If we can re- 
ject the null hypotheses of both the algorithms, then the 
time series under investigation is very likely to be chaotic. 
In this communication, the concrete procedures of surro- 
gate test for pseudoperiodicity are demonstrated through 
four simulation examples. As an application in practice, 
we also employ the surrogate technique based on our al- 
gorithm to investigate whether a human ECG record is 
possible to be chaotic. 
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